What Do Nighttime Lights Tell Us About a Place?

Humans are the only species on the planet that can produce enough energy to generate electromagnetic radiation from the visible spectrum that is visible from space. We create a tremendous amount of this energy by lighting the spaces we inhabit, work at, and travel through. These night lights are a distinctly human part of our planet; what, if anything, can they actually inform about humanity as a whole? Over the course of this lesson, we will be assessing if imagery captured by the VIIRS day/night band can provide us with a more nuanced understanding of the human population found at any given location. To do this we will be performing a series of spatial analyses to test the correlation between monthly averages of nighttime radiance captured by VIIRS and social economic factors conveyed by census data. We will be performing this evaluation over three counties in Texas, but the workflow will be possible anywhere census data is found, as VIIRS provides daily night light images of the entire Earth.

The Data

We are relying on a few different datasets for this tutorial. The first and most important one is monthly composites of nighttime radiance derived from the VIIRS day night band imagery and compiled by the Colorado School of Mines Earth Observation Group.

We’re also using county level data from Natural Earth.

Lastly, we will use census tract data of the American Community Survey 2015, which can be retrieved using the tidycensus R library.

Examing the Monthly Composites

You can find a more comprehensive description of the VIIRS monthly composites by opening
/data/describeVIIRSData.html
within your ‘intermediateGeospatialR’ folder.

Data Structure

The monthly composites are delivered at the continental scale. Each month has two images associated with it. The first image contains an average surface radiance value for that given location. The second image contains the number of observations across the month that were used to generate the average monthly value presented in the radiance image.

Distribution of Radiance Values

In the example provided in the describeVIIRSData.html, we can see that the distribution of radiance values is right skewed. This means that the majority of the observations are at or near zero and there are a few locations that significantly brighter. The is important information to understand before starting the analysis as most of the observations will tell us little more then it is dark at that location.

A Bright Place

The Lux, or Sky Beam, on top of the Luxor Hotel in Vegas is the brightest object on earth at 42.3 billions candela. This beam is produced by 39,7000 watt lamps. When all lamps are operating, the interior temperature of the room in which they are contained raises to approximately 300 degrees Fahrenheit.

Number of Observations

The VIIRS day/night band cannot capture data from under clouds. Therefore, not every night provides information regarding the surface radiance at a given location. When working with the monthly composites, it is essential to consider how many observations went into the calculation of a mean radiance. This is particularly important with this sensor, as the view angle from which the image was captured affects the observed radiance (think about a building blocking light in a particular direction). We will be evaluating this effect in the second part of the lesson.

Libraries of Interest

Processing Spatial Data

We are going to be working with sf and terra to process the imagery and polygons.

Census Data

tidycensus requires a personal access token to utilize the API. As such, we are not going to be pulling data from tidycensus as part of this lesson. However, the code showing how the data was pulled is provided as supporting material ( src/section1/prepWorkshopFilePrep.R) if this is something you need to do later on in your work.

Visualizing Spatial Data

We will be using the tmap library for quick visualizations of these spatial features.

Data Wrangling and Summarizing

dplyr, tidyr and stringr are great packages for efficient data wrangling, and are a part of the tidyverse, a collection of R packages designed for data science.

Installing Libraries

source("installPackages.R")

We now have a function in our environment called packageLoad(). With this function we can supply it with all the packages we need for the lesson, and for each one it checks if it is currently installed, installs it if it is not, and then loads all packages into our session memory.

# load required libraries 
packageLoad(c("sf", "terra", "dplyr", "tidyr", "stringr", "tmap"))

Since this lesson is in an R Project, your working directory should be automatically set to the correct path (the directory your .Rproj file is in). Double check that it is set to something that looks like: ~/Desktop/R_SC_Spatial/, depending where you saved the folder from GitHub.

Check in: Are all the libraries loaded?


Answer

You could still be waiting for installations, which is fine.

If you are having issues with a specific package connect with a helper.

If everything seemed to work but you want to double check, use the base R function below to see what’s currently loaded into your session memory.

(.packages())


Grabbing the Data

Imagery

We will pull our full list of images using the list.files() function. This function grabs all the files within a specific directory. We can filter for specific files based on a text pattern. We will use .tif to grab all the rasters. It’s essential here that we also indicate full names equals TRUE. This will provide the full path to the images, which we need to read them in.

# night light imagery 
images <- list.files(path = "intermediateGeospatialR/data/nightLights",
                       pattern = ".tif", 
                       full.names = TRUE)

Currently our images are just files paths, which is fine because we don’t need spatial objects just yet. There are two types of images for each month, count and radiance values. For this first lesson we just want the ‘counts’ data, which represents the number of observation days within that month.

# use the 'grep' function to filter out just the 'counts' image files
counts <- images[grepl(pattern = "_counts", x = images)]

Work Break

How would you select the remaining images in the file folder excluding the “_counts” images (i.e., just the radiance files)? We will be using these radiance files for the rest of the lesson.


Answer

The are multiple ways to approach this.

# use the opposite operator to grab the other half. '!' filters anything that does NOT equal that pattern
radiance <- images[!grepl(pattern = "_counts", x = images)]

# select all paths that are not current in the counts file 
radiance <- images[!images %in% counts]

# or select a pattern unique to the image set you need to grab
radiance <- images[grepl(pattern = "arc.tif", x = images)]


County Data

The county data has been prepped for us so we can read it in by pointing to the .shp file which it is stored in.

## county locations
counties <- sf::read_sf(dsn = "intermediateGeospatialR/data/counties/countyTex.shp",quiet =TRUE)
# head(counties)

This data contains records for three counties in Texas.

  • Bexar county is home to San Antonio.
  • Harris county is home to Houston.
  • Brazoria county is home to a few smaller municipalities and multiple coastal wildlife refugees, but is primarily rural.

Census Data

The census data for these three counties has be prepped for this lesson. Reference the file prepWorkshopFilePrep.R to see how to use the tidycensus library to call census data directly through R.

The census data was saved as a spatial .shp file (the census attributes were tied to each census tract within our counties of interest), so we can read it in like we did the county data.

## census tract data 
censusT <- sf::read_sf(dsn = "intermediateGeospatialR/data/census/ageAndPoverty.shp", quiet =TRUE)
# head(censusT)

Initial Data Visualization

With everything loaded, let’s plot the data and take a look at each one.

tmap::tmap_mode("view") # sets up the interactive map 
# we need to read the raster in to visualize 
tmap::qtm(terra::rast(radiance[1]))

Because the images object is just a vector of file paths we need to call the rast function to read in one of the images as a raster file to visual it. We do not need to store this as an object at this point.

The map is not exciting because most of the state is dark and the tmap library forces a generalization (reclassification) of the data to visual large extents. The net effect of this is a very boring map. We can examine the specific values of the raster to be sure the data is valid.

Nesting Functions to Visualize All Unique Values

# read in raster, grab values, determine unique values, convert to df for visualization purposes, View the dataframe for visual inspection 
View(data.frame(unique(values(terra::rast(radiance[1])))))

There is data there, so let’s move on.

Visualize the county data

# there are three counties of interest
tmap::qtm(counties)


Visualize Census Tract data

We are working at the census tract level. Each census track has on average 4,000 people in it. The population of Houston is about 2.3 million, so we can expect a lot of rows in this dataset. Add to that the fact that each census tract is likely to have a fairly complex geometry, which requires storing a lot of coordinates. So, lets understand more about this dataset before we try to plot it.

# By printing the object we can see some metadata and the first 10 rows in the console
censusT
## Simple feature collection with 2406 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -98.80655 ymin: 28.82557 xmax: -94.9085 ymax: 30.1705
## Geodetic CRS:  NAD83
## # A tibble: 2,406 x 6
##    GEOID       NAME      variable estimate    moe                       geometry
##    <chr>       <chr>     <chr>       <dbl>  <dbl>             <MULTIPOLYGON [°]>
##  1 48029190601 Census T~ B01002_~     31.3    3.7 (((-98.52602 29.46795, -98.52~
##  2 48029190601 Census T~ B17001_~    617    266   (((-98.52602 29.46795, -98.52~
##  3 48201311900 Census T~ B01002_~     35.7    3.5 (((-95.33319 29.72589, -95.32~
##  4 48201311900 Census T~ B17001_~    403    166   (((-95.33319 29.72589, -95.32~
##  5 48201450200 Census T~ B01002_~     39.6    2.4 (((-95.59029 29.78492, -95.57~
##  6 48201450200 Census T~ B17001_~     63     50   (((-95.59029 29.78492, -95.57~
##  7 48201450400 Census T~ B01002_~     28.8    8.7 (((-95.62377 29.78472, -95.62~
##  8 48201450400 Census T~ B17001_~    830   1069   (((-95.62377 29.78472, -95.62~
##  9 48201554502 Census T~ B01002_~     40.6    2.5 (((-95.65325 29.99056, -95.65~
## 10 48201554502 Census T~ B17001_~    203    170   (((-95.65325 29.99056, -95.65~
## # ... with 2,396 more rows
#the 'str' base R function gives us more details on the structure of the dataset
str(censusT)
## sf [2,406 x 6] (S3: sf/tbl_df/tbl/data.frame)
##  $ GEOID   : chr [1:2406] "48029190601" "48029190601" "48201311900" "48201311900" ...
##  $ NAME    : chr [1:2406] "Census Tract 1906.01, Bexar County, Texas" "Census Tract 1906.01, Bexar County, Texas" "Census Tract 3119, Harris County, Texas" "Census Tract 3119, Harris County, Texas" ...
##  $ variable: chr [1:2406] "B01002_001" "B17001_002" "B01002_001" "B17001_002" ...
##  $ estimate: num [1:2406] 31.3 617 35.7 403 39.6 63 28.8 830 40.6 203 ...
##  $ moe     : num [1:2406] 3.7 266 3.5 166 2.4 ...
##  $ geometry:sfc_MULTIPOLYGON of length 2406; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:14, 1:2] -98.5 -98.5 -98.5 -98.5 -98.5 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA
##   ..- attr(*, "names")= chr [1:5] "GEOID" "NAME" "variable" "estimate" ...

You can also inspect the spatial object as a dataframe with View, which allows you to sort and filter the data

View(censusT)

Let’s plot these census tracts now. Note that you can click on a census tract to view a pop-up of its attributes.

tmap::qtm(censusT)

Coordinate Systems

Probably the most important part of working with spatial data is the coordinate reference system (CRS) that is used. In order to analyze and visualize spatial data, all objects must be in the exact same CRS.

Therefore, before we move on lets check the CRS of our raster and vector files: The data looks good, so let’s check the details.

# All the raster images were created using the same methodology so we can check the CRS of just one of them 
terra::rast(images[1])
## class       : SpatRaster 
## dimensions  : 650, 798, 1  (nrow, ncol, nlyr)
## resolution  : 0.01666667, 0.01666667  (x, y)
## extent      : -106.7271, -93.42708, 25.75208, 36.58542  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : april_10arc.tif 
## name        : april_10arc 
## min value   :  0.08749756 
## max value   :    4220.476
#we will use this rasters' propoerties later so let's save it as a variable
temp <- terra::rast(images[1])

This dataset is in WGS84 (longitude/latitude).

# print to view attributes 
censusT
## Simple feature collection with 2406 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -98.80655 ymin: 28.82557 xmax: -94.9085 ymax: 30.1705
## Geodetic CRS:  NAD83
## # A tibble: 2,406 x 6
##    GEOID       NAME      variable estimate    moe                       geometry
##    <chr>       <chr>     <chr>       <dbl>  <dbl>             <MULTIPOLYGON [°]>
##  1 48029190601 Census T~ B01002_~     31.3    3.7 (((-98.52602 29.46795, -98.52~
##  2 48029190601 Census T~ B17001_~    617    266   (((-98.52602 29.46795, -98.52~
##  3 48201311900 Census T~ B01002_~     35.7    3.5 (((-95.33319 29.72589, -95.32~
##  4 48201311900 Census T~ B17001_~    403    166   (((-95.33319 29.72589, -95.32~
##  5 48201450200 Census T~ B01002_~     39.6    2.4 (((-95.59029 29.78492, -95.57~
##  6 48201450200 Census T~ B17001_~     63     50   (((-95.59029 29.78492, -95.57~
##  7 48201450400 Census T~ B01002_~     28.8    8.7 (((-95.62377 29.78472, -95.62~
##  8 48201450400 Census T~ B17001_~    830   1069   (((-95.62377 29.78472, -95.62~
##  9 48201554502 Census T~ B01002_~     40.6    2.5 (((-95.65325 29.99056, -95.65~
## 10 48201554502 Census T~ B17001_~    203    170   (((-95.65325 29.99056, -95.65~
## # ... with 2,396 more rows

This dataset is in NAD83.

# print to view attributes
counties
## Simple feature collection with 3 features and 17 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -98.80655 ymin: 28.76484 xmax: -94.90849 ymax: 30.17061
## Geodetic CRS:  NAD83
## # A tibble: 3 x 18
##   STATEFP COUNTYFP COUNTYNS GEOID NAME     NAMELSAD    LSAD  CLASSFP MTFCC CSAFP
##   <chr>   <chr>    <chr>    <chr> <chr>    <chr>       <chr> <chr>   <chr> <chr>
## 1 48      029      01383800 48029 Bexar    Bexar Coun~ 06    H1      G4020 <NA> 
## 2 48      201      01383886 48201 Harris   Harris Cou~ 06    H1      G4020 288  
## 3 48      039      01383805 48039 Brazoria Brazoria C~ 06    H1      G4020 288  
## # ... with 8 more variables: CBSAFP <chr>, METDIVFP <chr>, FUNCSTAT <chr>,
## #   ALAND <dbl>, AWATER <dbl>, INTPTLAT <chr>, INTPTLON <chr>,
## #   geometry <POLYGON [°]>

This dataset is also in NAD83.

Matching the CRS

The vector and raster datasets are in a different CRS: WGS84 and NAD83. We need to correct that before we do any spatial analysis.

What coordinate reference system you choose to use is up to you, but for this lesson we are going to take our two sf features and project them to WGS84. The reason for this is twofold. First, there are fewer features to reproject going in this direction. Second, when you’re reprojecting raster data, there is the potential for resampling to occur.

It’s probably not going to be an issue with WGS84 to NAD83 in the United States, but since the raster data is our primary dataset we want to avoid altering it any more than we need to.

To transform vector data, we can use the st_transform() function, supplying it with the CRS of the raster files which can be retrieved using the crs() function

# reproject datasets to match the CRS of our raster object 
counties <- sf::st_transform(x = counties, crs = crs(temp))
censusT <- sf::st_transform(x = censusT, crs = crs(temp))

Test the transformation

Let’s double-check that all our vector and raster data are in the same CRS so we can move forward. We can use the st_crs() function from the sf package, which can also retrieve CRS info from terra raster objects

# test for CRS match 
st_crs(counties) == st_crs(temp)
## [1] TRUE
st_crs(censusT) == st_crs(temp)
## [1] TRUE

Great, both return ‘TRUE’.

With the reprojection taken care of, let’s drop the ‘temp’ raster object to keep our working directory clean and memory open.

# remove the raster object to keep the working directory clean and free up working memory  
rm(temp)

Process the Image to the County Level

The imagery is still at the state level. We need to process it down to the county level so that we can associate it with our census tract data. Working with the smallest geographic extent you can is always going to save you time and hopefully headaches as it is less computationally intensive.

Vector data with terra

So far we have just been using properties of each raster and vector file to perform operations, however to conduct spatial analysis there is one more step we need to take to make these two types of spatial data compatible. The terra package uses a special form of vector data called SpatVect objects instead of sf objects. Luckily, they have made it relatively easy to convert sf objects to SpatVect objects using the vect() function. We will do this first before we build our workflow.

# convert sf objects to spatvect
counties <- vect(counties)
censusT <- vect(censusT)

A Note on Creating a Workflow

The truth is, when you pick up a new workflow (like this one), you’re not going to get it right the first time.

So, we recommend breaking down the process to the minimum number of features to get that workflow working first, then you can start thinking about iterating the process.

Start Small

### figure out the process with one feature

# create a raster from one image file path 
r1 <- terra::rast(radiance[1])

### crop the image to one of the counties

# select county of interest (each row is a unique county)
c1 <- counties[1, ]

# crop the raster to the county extent 
r2 <- terra::crop(x = r1, y = c1)

# mask the image (only keep cells that overlap with the county polygon)
r3 <- terra::mask(x = r2, mask = c1)

# take a look to see if this worked 
qtm(r3)

The basic workflow consists of indexing the images and the counties. We then call the crop() and mask() functions from the terra library. The result is a raster that has been reduced to the boundary of the county of interest.


Streamline the Workflow

Now that we know that process works, let’s try to streamline this process a little bit so that we are not creating so many variables in our environment. We will utilize dplyr piping structure. The %>% pipe is a custom operator that takes the output of one function and places it as an input into another function. This operator allows you to connect functions from output to input, output to input, without creating intermediate variables, thus resulting in more efficient code.


Answer
### select county of interest 
c1 <- counties[1, ]
## read, crop and mask the first raster image 
r1 <- terra::rast(radiance[1])%>%
  terra::crop(y = c1)%>%
  terra::mask(mask = c1)
qtm(r1)
We’ve accomplished the same result, but this time we’ve only declared two variables. This adds clarity to our code and is generally a good practice to follow.


Make it a Function

The workflow we’ve created above requires two inputs and produces a single output. To make this code more transferable, we can build it out as a function. This optimizes the code for reuse and will allow us to efficiently apply the process to multiple features.


Answer
clipMask <- function(path, extent){
  ## path is a full file path to a raster object
  ## extent is a spatial object (in SpatVect format) which will be used to clip and mask the raster
  ## returns a raster object that is clipped and masked to the extent object 
  r1 <- terra::rast(path) %>%
  terra::crop(y = extent) %>%
  terra::mask(mask = extent)
return(r1)
}

When we write up the function, we’re inherently abstracting some of the components. So we’re no longer indexing from the county object to pull a specific county. Instead, we’re just saying we want an extent object and we want a raster. Counter this abstraction by putting documentation right at the start of your function so people can understand what it is you’re asking for.


Applying the Function to All Images

In the data folder /intermediateGeospatialR/data/nightLights we have radiance imagery for 12 months in 2019. We also have three counties of interest that we’re going to be working in. If we want radiance for each county for each month, we need to create 36 images.

We could hard code to call our function 36 times.

im1 <- clipMask(path = radiance[1], extent = counties[1,])
im2 <- clipMask(path = radiance[2], extent = counties[1,])
## so on an so one

While this will definitely work, it is very repetitive. If we had hundreds of counties instead of 3 this would take forever to write. This is where iterating with ‘for loops’ comes into play.

Loop the Loop

We can define all 33 features much more efficiently by utilizing for loops.

For loops (think “for each feature in the list”) are a great tool for doing the same operation multiple times.

When developing a for loop operation, it is helpful to write out the general structure first and then assign your variables to test the process before you let the loop run. If the loop works once, it should (ideally) work every time. For this example we expect a single image to return from each iteration of the loop.

### outline the structure you want 

# for each county - return an image for each month
## select the county of interest
# for each raster
## select a specific month 
## clipMask()
## save the processed image 

Take a moment to fill in the code using the above outline.


Answer
# for each county - return an image for each month
for(i in seq_along(counties)){
  c1 <- counties[i,] # select a specific county 
  for(j in seq_along(radiance)){
    ## pull a specific month's image 
    p1 <- radiance[j]
    ## clipMask()
    r2 <- clipMask(path = p1, extent = c1)
    ## save the processed image 
    # we will come back to this
  }
}

So it looks okay, but let’s test it on just a single iteration before we run the entire loop on all features.

### define i and j so the loop only runs for the first features 
i <- 1
j <- 1
###
#comment out the loop part since we are just running once 
#for(i in seq_along(counties$STATEFP)){
  c1 <- counties[i,] # select a specific county 
#  for(j in seq_along(images)){
    ## pull a specific month's image 
    # p1 <- images[j]
    ## clipMask()
    r2 <- clipMask(path = radiance[j], extent = c1)
    ## save the processed image 
    # we will come back to this
#  }
#}
### print the output to check results 
r2
## class       : SpatRaster 
## dimensions  : 39, 42, 1  (nrow, ncol, nlyr)
## resolution  : 0.01666667, 0.01666667  (x, y)
## extent      : -98.81041, -98.11041, 29.11875, 29.76875  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : memory 
## name        : april_10arc 
## min value   :    0.425001 
## max value   :    116.5439


The slimmed down loop worked as we expected. It returned a processed image, but we haven’t figured out what to do with the processed image yet.

Saving the Output From the Loop

If we were to run the looping operation as is, it would process the 36 images, but we would end up with only a single feature at the end. That is because the variable r2 is being rewritten every time the loop repeats itself. We could get crafty and store all these images in memory using a list(), or we can write out the files and pull them back in as needed. There are pros and cons for both, but today we will be writing out the imagery files.


See the code below if you wanted to store it all in memory as a list.

# create empty list
processImages <- list()

for(i in seq_along(counties)){
  c1 <- counties[i,] # select a specific county 
  list1 <- list()
  for(j in seq_along(radiance)){
    ## pull a specific month's image
    # p1 <- images[j]
    ## clipMask()
    list1[[j]]<- clipMask(path = radiance[j], extent = c1)
  }
  processImages[[i]] <- list1
  rm(list1)
}
processImages

Growing a list like this is pretty inefficient and indexing can be a lot to keep track of. Still, there are cases when working in memory is the best option.

Develop a File Structure

If we want to save the processed images, we need to first create a directory in which we can store all the images for each county. We can do this using a similar loop structure as the one above. By concatenating characters using the paste0() function we can build out paths for new files.

# create new file directories for processed imagery 
for(i in unique(counties$NAME)){ # looping over names not indices 
  print(i)
  location <- paste0("data/nightLights/", as.character(i)) # create path for file 
  if(!file.exists(location)){ # test if this folder already exists 
    dir.create(path = location) # if it doesn't exist, create the folder 
  }
}

With the file structure in place, we can run the process and write out the rasters by constructing a unique path and file name for a given month at a specific location.

We can use names(r1) in this case because these images were named when they were prepped for this lesson. You can check out how this was done in the script prepWorkshopFilePrep.R. That process is still dependent on initial file management actions shown here. All this is to say it is important to think about your folder structure when creating material as it can have long lasting effects on your workflow.

# for each county - return an image for each month
for(i in seq_along(counties)){
  print(i)
  c1 <- counties[i,] # select a specific county 
  for(j in seq_along(radiance)){
    ## pull a specific month's image 
    p1 <- radiance[j]
    ## clipMask()
    r2 <- clipMask(path = p1, extent = c1)
    ## save the processed image 
    countyName <- as.character(c1$NAME) # define county name
    rasterName <- names(r2) # define raster name
    file <- paste0("intermediateGeospatialR/data/nightLights/",countyName, "/",rasterName,".tif") # concatenate features to create a path
    terra::writeRaster(x = r2 , 
                        filename = file,
                        overwrite = TRUE)
  }
}

Room for Improvement?

So this does work well, but looking back at the clipMask function we can see that it reads in the same raster multiple times during the process.

clipMask
## function(path, extent){
##   ## path is a full file path to a raster object
##   ## extent is a spatial object (in SpatVect format) which will be used to clip and mask the raster
##   ## returns a raster object that is clipped and masked to the extent object 
##   r1 <- terra::rast(path) %>%
##   terra::crop(y = extent) %>%
##   terra::mask(mask = extent)
## return(r1)
## }

Since we iterate over the counties first, we have to read in each monthly raster three times, once for each county. That’s not that big of a deal, because these rasters are relatively small images, but if they were huge, we would be waiting three times more than we need to.

Let’s adapt the function so that we can rework the workflow and only read in monthly images once.

Reduce the Number of Raster Reads by 3

We wrote the first clip mask function in the script that we are using to run the code, but you can also write functions as standalone scripts and source() them into your code. That way, you can keep your primary work flow clean and still have your functions to rely on. So let’s source this new function (saved as a script in the intermediate lesson folder) and take a look at it.

source("intermediateGeospatialR/src/section1/clipMask2.R")
print(clipMask2)
## function (raster, extent) 
## {
##     if (terra::ext(raster) < terra::ext(extent)) {
##         print("The raster may be smaller then the extent object")
##     }
##     if (terra::crs(raster) != terra::crs(extent)) {
##         return("The crs of the objects to not overlap")
##     }
##     else {
##         return(raster %>% terra::crop(y = extent) %>% terra::mask(mask = extent))
##     }
## }

The output of this function is the same as our current one, but there are three major differences.

1. Input a raster object, not a path to an image.

function(raster, extent){
  # raster : a raster object 
  # extent : a spatial feature or extent object 
}

Rather then reading the raster file in within the function, we are passing a raster object to the function. This is the key step for reducing the number of times we need to read in data within our workflow.

2. Testing the extent of the objects.

We build in a condition that tests if the raster is larger than the feature it is being clipped and masked to. This is a just a concept check, but a great one if you’re planning on using this workflow in multiple different projects.

 if(terra::ext(raster) < terra::ext(extent)){
    print("The raster may be smaller then the extent object")
  }

3. Testing the CRS of the objects.

If the objects are not in the same CRS we cannot perform the spatial analysis.

 if(terra::crs(raster) != terra::crs(extent)){
    return("The crs of the objects to not overlap")
  }else{
    return(raster %>%
      terra::crop(y = extent) %>%
      terra::mask(mask = extent))
  }

Restructure the Workflow for the New Function

As the clipMask2() function requires a raster object as the input, we need to restructure our for loops to account for this change. The goal here is to limit the number of times we need to read in the raster objects, so let’s start by looping over all twelve images.

for(i in seq_along(radiance)){
  r1 <- terra::rast(radiance[i]) # read in the raster
  nameR <- names(r1) # save the name as a variable to use in the file structure later
}

We still need our extent object to run the function, so let’s next loop over that feature.

for(j in seq_along(counties)){
  c1 <- counties[j,] # select a specific county 
  nameC <- as.character(c1$NAME) # grab the name of the county 
  r2 <- clipMask2(raster = r1, extent = c1) # call the function with input from the loop above
  ### we've already written this content out so we don't need to repeat the process
  # raster::writeRaster(x = r2,filename = paste0(baseDir, "/data/nightLights/", nameC,"/",nameR,".tif"))
}

Now we can combine the loops.

# loop over images 
for(i in seq_along(radiance)){
  r1 <- terra::rast(radiance[i])
  nameR <- names(r1)
  # loop over counties 
  for(j in seq_along(counties)){
    c1 <- counties[j,]
    nameC <- as.character(c1$NAME) 
    r2 <- clipMask2(raster = r1, extent = c1)
    ### we've already writen this content out so we don't need to repeat the process. 
    # raster::writeRaster(x = r2,filename = paste0"intermediateGeospatialR/data/nightLights/", nameC,"/",nameR,".tif"))
  }
}

This is great. We get to the same result but using a different path that is more efficient because we are reading in less data.

Closing Thoughts

We wanted to walk through this as an iterative process because this is how creating a workflow actually works. You don’t get it right the first time, and you usually don’t get it right the second or third time. Good code is good because you come back to it, you alter it, and you make it better over time.

Prep the Census Data

The rasters are prepped for 12 months for every county of interest. Now we need to work on our census data so that we can associate average nighttime radiance with each one of these census tracks.

head(censusT)
##         GEOID                                      NAME   variable estimate
## 1 48029190601 Census Tract 1906.01, Bexar County, Texas B01002_001     31.3
## 2 48029190601 Census Tract 1906.01, Bexar County, Texas B17001_002    617.0
## 3 48201311900   Census Tract 3119, Harris County, Texas B01002_001     35.7
## 4 48201311900   Census Tract 3119, Harris County, Texas B17001_002    403.0
## 5 48201450200   Census Tract 4502, Harris County, Texas B01002_001     39.6
## 6 48201450200   Census Tract 4502, Harris County, Texas B17001_002     63.0
##     moe
## 1   3.7
## 2 266.0
## 3   3.5
## 4 166.0
## 5   2.4
## 6  50.0
dim(censusT)
## [1] 2406    5
length(unique(censusT$GEOID))
## [1] 1203

Restructuring the Census Data

Each census tract has two variables associated with it. Because of how we pulled the data, these appear as separate rows, which duplicates all of our geometries. One way to fix this would be to filter by our two variables, creating two separate datasets, specify variable names and then join them back together.

# Grab unique variables. 
vals <- unique(censusT$variable)
### B01002_001 == median age 
### B17001_002 == poverty

Say you wanted to filter out just the median age data, whose variable ID is ‘B01002_001’, how would you do that?


Answer

There are a few ways to approach it using base R or dplyr.

#Base R 
c1 <- censusT[censusT$variable == 'B01002_001', ]

# dplyr
c1 <- st_as_sf(censusT) %>% #dplyr only works with 'sf' objects, so we must first convert our 'spatvect' object
  dplyr::filter(variable == 'B01002_001')


There is a more efficient way we can clean this data that does not involve creating intermediate datasets. This would be to transpose the data, which can be done in the tidyr package with the pivot_wider() and pivot_longer() functions.

censusData <- st_as_sf(censusT) %>% # convert back to sf to perform dplyr and tidyr operations
  as.data.frame() %>% # treat as data frame to perform tidyverse functions
  tidyr::pivot_wider(names_from = variable, values_from = c('estimate', 'moe')) %>% 
  dplyr::rename(medianAge_estimate = estimate_B01002_001, medianAge_moe = moe_B01002_001,
                poverty_estimate = estimate_B17001_002, poverty_moe = moe_B17001_002) %>%  #rename columns to something more reader-friendly
  st_as_sf() #convert back to sf object

qtm(censusData)

Now we have a cleaned dataset with a single row for each GEOID and mroe descriptive variable names. When we click on a census tract we can see the values for median age and poverty in the pop-up.

Associate Nightly Radiance with Each Census Tract

Our goal is to end up with a single measure of radiance per census tract per month. This will require some iterative processes, so just like before when we were developing the method for processing the rasters, we want to create a test set first and make sure we get our methodology set before scaling up to an iterative workflow.

## create a subset to test the process. 
tempr <- terra::rast("intermediateGeospatialR/data/nightLights/Harris/june_10arc.tif")
head(censusData) # look for harris county locations (to match our temp raster) 
## Simple feature collection with 6 features and 6 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -98.52602 ymin: 29.46644 xmax: -95.31069 ymax: 30.00429
## Geodetic CRS:  WGS 84
## # A tibble: 6 x 7
##   GEOID       NAME                    geometry medianAge_estim~ poverty_estimate
##   <chr>       <chr>              <POLYGON [°]>            <dbl>            <dbl>
## 1 48029190601 Censu~ ((-98.52602 29.46795, -9~             31.3              617
## 2 48201311900 Censu~ ((-95.33319 29.72589, -9~             35.7              403
## 3 48201450200 Censu~ ((-95.59029 29.78492, -9~             39.6               63
## 4 48201450400 Censu~ ((-95.62377 29.78472, -9~             28.8              830
## 5 48201554502 Censu~ ((-95.65325 29.99056, -9~             40.6              203
## 6 48201550603 Censu~ ((-95.48031 29.96432, -9~             30.3             1404
## # ... with 2 more variables: medianAge_moe <dbl>, poverty_moe <dbl>
# pull a subset 
tempc <- censusData[2:4,]

Extract Values to Polygons

The terra::extract() function is a marvelous tool. We can pass a raster and a spatial object and it will pull the values from the raster that intersect with the spatial object.

The function returns all extracted values as a numeric vector tied to an ‘ID’ related to the spatial features

# extract values to a vector 
terra::extract(x = tempr, y = vect(tempc)) ## remember sf objects must be converted to `spatvect` objects to use in terra functions

Since there are multiple values for each census tract, we can call a summarizing function within the extract() function. This is great when you are looking for a single value like the mean.

## average the radiance values within each spatial feature
tempc$june <- terra::extract(x = tempr, y = vect(tempc), fun = mean) 

Comparing Census Data to Radiance

Our census data is across three counties. Since our workflow is per county, we will need to subset our census tract data by county name.

View(censusData)

We can do this using filter() from the dplyr package and str_detect() from the stringr package, which detects specific character strings.

censusHarris <- censusData %>% 
  filter(str_detect(NAME, 'Harris'))

qtm(censusHarris)

We now have only the census tracts associated with the specific county.

Creating the Workflow

Let’s outline what needs to happen.

# for each county 
## select census tracts of interest 
## pull all imagery associated with that county
# for each image 
## extract the radiance values for each census tract
## store that data 

The first loop will look very similar to what we have done previously.

# loop over counties 
for(i in seq_along(counties)){
  # get county name
  cName <- as.character(counties$NAME[i])

  ## subset the census data connected to a specific county
  c1 <- censusData %>% 
    filter(str_detect(NAME, cName)) 
  
  # construct a file directory to show where to look for specific county images 
  
  dir <- paste0("intermediateGeospatialR/data/nightLights/",cName)
  
  ## pull all rasters from a county of interest 
  rasters <- list.files(path = dir, pattern = ".tif", full.names = TRUE)
  
}

We have our images and filtered census tract data so let’s work on the second loop.

for(j in seq_along(rasters)){
    print(j)
    r2 <- terra::rast(rasters[j]) # read in image
    n1 <- names(r2)
    # assigning column based on raster name and calculate mean value per area 
    c1[, n1]  <- terra::extract(x = r2, y = vect(c1), fun = mean)[,2] # extract returns a matrix, so we are indexing the column so we can store the data as a vector within the sf object. 
  }

Since this workflow is creating 3 dataframes (one for each county), we need to store those at the end of each iteration as ‘c1’ becomes overwritten with the new county data at the beginning of the next iteration. A good way to do this is to save each output as an element in a list, and then you can combine elements of the list into a single dataframe afterwards.

# create an empty list
df <- list()

Now we can add all the sections together.

# loop over counties 
for(i in seq_along(counties)){
 cName <- as.character(counties$NAME[i])

  ## subset the census data connected to a specific county
  c1 <- censusData %>% 
    filter(str_detect(NAME, cName)) 
  
  # construct a file directory to show where to look for specific county images 
  
  dir <- paste0("intermediateGeospatialR/data/nightLights/",cName)
  
  ## pull all rasters from a county of interest 
  rasters <- list.files(path = dir, pattern = ".tif", full.names = TRUE)
  
  for(j in seq_along(rasters)){
    print(j)
    r2 <- terra::rast(rasters[j]) # read in image
    n1 <- names(r2)
    # assigning column based on raster name and calculate mean value per area 
    c1[, n1]  <- terra::extract(x = r2, y = vect(c1), fun = mean)[,2] # extract returns a matrix, so we are indexing the column so we can store the data as a vector within the sf object. 
  }
  ## condition to compile the datasets, outside of the "j" loop
  df[[i]] <- c1
}

# combine the data for each county into a single dataframe
df <- bind_rows(df)
# check the output
# View(df)

Are Night Lights and Social Ecomonic Values Correlated?

Now to start getting at the question we were initially interested in. Since this dataset did take a bit of time to put together, we can write out the feature to ensure we maintain it.

sf::write_sf(df, "intermediateGeospatialR/outputs/censusNightLightRadiance.shp")

Evaluating the Average Yearly Radiance

The ACS census data is representative of one year: 2015. Our night lights data is from the year 2019. While there might be some interesting relationships at the monthly level, for the first looks let’s just focus on a yearly operation.

# calculate the yearly average radiance for each feature and add a new column
df <- df %>% 
  as.data.frame() %>% 
  mutate(averageRadiance = rowMeans(dplyr::select(., april_10arc:september_10arc), na.rm = TRUE)) %>% 
  st_as_sf()

We have our average values generated for each census tract so let’s plot them and see how things look.

### We now have an average yearly radiance for each location, so let's plot the relationship between radiance and our socioeconomic variables
library(ggplot2)

age <- ggplot(df, aes(medianAge_estimate, averageRadiance)) +
  geom_point() +
  stat_smooth()

poverty <- ggplot(df, aes(poverty_estimate, averageRadiance)) +
  geom_point() +
  stat_smooth()

Looking at the age plot

age

There doesn’t appear to be a strong relationship. What about poverty?

poverty

The relationship between night light radiance and poverty looks a little stronger than age, with higher poverty estimates in areas with higher radiance.

Lastly, lets look at these relationships spatially.

# create a facet map to show the census tracts. 
tm_shape(df) +
    tm_polygons(c("averageRadiance", "medianAge_estimate", "poverty_estimate")) +
    tm_facets(sync = TRUE, ncol = 3)


What About the Monthly Averages?

Disheartened by the weak relationships present in your revolutionary study, maybe we should have first considered the quality of our raw data. If we go back to the beginning of the lesson, we know that cloud cover affects the number of observations within a given month. It’s possible that we should exclude some months from the analysis because of the limited number of observations that went into the mean. Let’s first look at the variance of the average monthly values for each location.

# use rowise() and c_across() from dplyr to calculate standard deviation 
df <- df %>% 
  as.data.frame() %>% 
  rowwise() %>% 
  mutate(varianceRadiance = sd(c_across(april_10arc:september_10arc), na.rm = TRUE)) %>% 
  st_as_sf()

This gives us enough information to suggest it’s worthwhile to evaluate the quality of the monthly images before going back to look at potential correlations. We’ll be looking at this in the next lesson.